Generalized exponential distribution (genexpon)#
The generalized exponential distribution in SciPy (scipy.stats.genexpon) is a continuous distribution on \([0,\infty)\) with an especially simple hazard rate:

$$h(x) = a + b\left(1 - e^{-cx}\right), \qquad a, b, c > 0.$$

So the instantaneous failure/event rate starts at \(h(0)=a\) and increases smoothly toward the asymptote \(h(\infty)=a+b\).
Learning goals#
- write down the PDF/CDF/survival/hazard (including SciPy's loc/scale form)
- understand how \((a,b,c)\) control the hazard and the tail
- compute moments via a convergent series and connect them to SciPy's stats
- sample in a NumPy-only way via inverse CDF (bisection)
- visualize PDF/CDF and Monte Carlo samples, and fit the model in SciPy
import math
import numpy as np
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from scipy import stats
from scipy.special import gammaln
from scipy.stats import genexpon as genexpon_sp
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(7)
# A default parameter set used throughout
A0, B0, C0 = 1.2, 0.7, 2.0
1) Title & classification#
Name: `genexpon` (generalized exponential; SciPy's `scipy.stats.genexpon`)
Type: continuous distribution
Standard support: \(x \in [0,\infty)\)
Parameter space (shape parameters): \(a>0\), \(b>0\), \(c>0\)
SciPy also provides a location–scale form:
location: \(\mathrm{loc} \in \mathbb{R}\)
scale: \(\mathrm{scale} > 0\)
support: \(x \in [\mathrm{loc},\infty)\)
with the standardization \(z=(x-\mathrm{loc})/\mathrm{scale}\).
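These loc/scale identities can be verified directly against SciPy (a quick sketch; the specific `loc`/`scale` values are arbitrary):

```python
import numpy as np
from scipy.stats import genexpon

a, b, c = 1.2, 0.7, 2.0
loc, scale = 0.5, 2.0
x = np.linspace(0.5, 6.0, 40)
z = (x - loc) / scale  # standardized coordinate

# The pdf picks up a 1/scale Jacobian; the cdf is just evaluated at z
assert np.allclose(genexpon.pdf(x, a, b, c, loc=loc, scale=scale),
                   genexpon.pdf(z, a, b, c) / scale)
assert np.allclose(genexpon.cdf(x, a, b, c, loc=loc, scale=scale),
                   genexpon.cdf(z, a, b, c))
print("loc/scale identities hold")
```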
2) Intuition & motivation#
Hazard-rate view (why genexpon is nice)#
For a nonnegative continuous random variable \(X\) with PDF \(f\) and CDF \(F\), the hazard rate is

$$h(x) = \frac{f(x)}{1 - F(x)}.$$

For genexpon, the hazard has the simple form

$$h(x) = a + b\left(1 - e^{-cx}\right).$$
Interpretation:
\(a\) is the baseline event rate at time 0.
\(b\) is an additional rate that “turns on” over time.
\(c\) controls how quickly the hazard transitions from \(a\) to \(a+b\).
So genexpon is a flexible replacement for an exponential waiting time when the “memoryless / constant hazard” assumption is too rigid.
Typical real-world use cases#
Reliability / survival analysis: time-to-failure with an increasing hazard that saturates (risk increases with age but levels off).
Event-time modeling: time-to-churn, time-to-conversion, time-to-default when risk ramps up then stabilizes.
Simulation: bounded, increasing intensity models in renewal/queueing-style simulations.
Relations to other distributions#
Exponential: if \(b=0\), then \(h(x)=a\) and \(X\sim\mathrm{Exponential}(\text{rate}=a)\).
Exponential with higher rate: as \(c\to\infty\), \(h(x)\) transitions rapidly to \(a+b\), and the distribution approaches \(\mathrm{Exponential}(\text{rate}=a+b)\).
Weibull-like near 0: for small \(c\), \(1-e^{-cx}\approx cx\), so \(h(x)\approx a + (bc)x\) (a “linear hazard” shape, like Weibull with shape near 2 when \(a\) is small).
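These limiting relations are easy to sanity-check numerically. SciPy requires strictly positive shape parameters, so the \(b=0\) case is approximated with a tiny \(b\) (a quick sketch):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 5.0, 50)

# b -> 0: genexpon collapses to Exponential(rate=a)
a, b, c = 1.2, 1e-9, 2.0
diff_small_b = np.max(np.abs(stats.genexpon.cdf(x, a, b, c)
                             - stats.expon.cdf(x, scale=1.0 / a)))

# c large: the hazard jumps to a+b almost immediately, so the
# distribution is close to Exponential(rate=a+b)
a, b, c = 1.2, 0.7, 200.0
diff_large_c = np.max(np.abs(stats.genexpon.cdf(x, a, b, c)
                             - stats.expon.cdf(x, scale=1.0 / (a + b))))

print(diff_small_b, diff_large_c)
```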
3) Formal definition#
Let \(a,b,c>0\). The standardized genexpon distribution has support \(x\ge 0\) and PDF

$$f(x) = \left(a + b\left(1 - e^{-cx}\right)\right)\exp\!\left(-(a+b)x + \frac{b}{c}\left(1 - e^{-cx}\right)\right).$$

Define the survival function \(S(x)=1-F(x)\). A key identity is that genexpon has an explicit survival function:

$$S(x) = \exp\!\left(-(a+b)x + \frac{b}{c}\left(1 - e^{-cx}\right)\right).$$

So the CDF is

$$F(x) = 1 - S(x) = 1 - \exp\!\left(-(a+b)x + \frac{b}{c}\left(1 - e^{-cx}\right)\right).$$

The hazard rate becomes

$$h(x) = \frac{f(x)}{S(x)} = a + b\left(1 - e^{-cx}\right).$$
SciPy location–scale parameterization#
SciPy uses the standard distribution plus loc and scale. With \(z=(x-\mathrm{loc})/\mathrm{scale}\),

$$f_X(x) = \frac{1}{\mathrm{scale}}\,f(z), \qquad F_X(x) = F(z).$$

In particular, the support is \(x\ge\mathrm{loc}\).
def _validate_genexpon_params(a, b, c, scale):
if not (a > 0 and b > 0 and c > 0):
raise ValueError("a, b, c must be > 0")
if not (scale > 0):
raise ValueError("scale must be > 0")
def genexpon_logsf(x, a, b, c, loc=0.0, scale=1.0):
'''Log survival log P(X > x) for genexpon in SciPy's loc/scale form (NumPy-only).'''
_validate_genexpon_params(a, b, c, scale)
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
out = np.zeros_like(z, dtype=float) # logsf = 0 for z < 0 (sf = 1)
mask = z >= 0
zz = z[mask]
one_minus_exp = -np.expm1(-c * zz) # 1 - exp(-c z)
out[mask] = (-a - b) * zz + (b / c) * one_minus_exp
return out
def genexpon_sf(x, a, b, c, loc=0.0, scale=1.0):
'''Survival function P(X > x) (NumPy-only).'''
return np.exp(genexpon_logsf(x, a, b, c, loc=loc, scale=scale))
def genexpon_cdf(x, a, b, c, loc=0.0, scale=1.0):
'''CDF P(X <= x) computed stably as 1 - exp(logsf) (NumPy-only).'''
logsf = genexpon_logsf(x, a, b, c, loc=loc, scale=scale)
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
out = np.zeros_like(z, dtype=float)
mask = z >= 0
out[mask] = -np.expm1(logsf[mask])
return out
def genexpon_logpdf(x, a, b, c, loc=0.0, scale=1.0):
'''Log PDF in SciPy's loc/scale form (NumPy-only).'''
_validate_genexpon_params(a, b, c, scale)
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
out = np.full_like(z, -np.inf, dtype=float)
mask = z >= 0
zz = z[mask]
one_minus_exp = -np.expm1(-c * zz)
hazard = a + b * one_minus_exp
out[mask] = np.log(hazard) + (-a - b) * zz + (b / c) * one_minus_exp - np.log(scale)
return out
def genexpon_pdf(x, a, b, c, loc=0.0, scale=1.0):
'''PDF in SciPy's loc/scale form (NumPy-only).'''
return np.exp(genexpon_logpdf(x, a, b, c, loc=loc, scale=scale))
def genexpon_hazard(x, a, b, c, loc=0.0, scale=1.0):
'''Hazard rate h(x) = f(x) / (1 - F(x)) in SciPy's loc/scale form (NumPy-only).'''
_validate_genexpon_params(a, b, c, scale)
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
out = np.zeros_like(z, dtype=float)
mask = z >= 0
zz = z[mask]
one_minus_exp = -np.expm1(-c * zz)
# hazard scales as 1/scale under x = loc + scale * z
out[mask] = (a + b * one_minus_exp) / scale
return out
4) Moments & properties#
Tail behavior and existence of moments#
Because the hazard converges to the constant \(a+b\), the tail is exponentially bounded:

$$S(x) = e^{-(a+b)x}\,e^{(b/c)\left(1-e^{-cx}\right)} \le e^{b/c}\,e^{-(a+b)x}.$$

So all polynomial moments exist, and the MGF exists at least for \(t<a+b\).
Raw moments (integer orders) via a convergent series#
For integer \(n\ge 1\) one convenient expression is

$$\mathbb{E}[X^n] = n!\,e^{b/c}\sum_{k\ge 0}\frac{(-b/c)^k}{k!\,(a+b+ck)^n}.$$
In particular:
mean: \(\mathbb{E}[X]=1!\,e^{b/c}\sum_{k\ge 0}\dfrac{(-b/c)^k}{k!\,(a+b+ck)}\)
second raw moment: \(\mathbb{E}[X^2]=2!\,e^{b/c}\sum_{k\ge 0}\dfrac{(-b/c)^k}{k!\,(a+b+ck)^2}\)
variance: \(\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2\)
Skewness and kurtosis can be computed from the first four raw moments.
MGF / characteristic function#
Using the same expansion (see Section 6), for \(t<a+b\):

$$M(t) = \mathbb{E}\!\left[e^{tX}\right] = 1 + t\,e^{b/c}\sum_{k\ge 0}\frac{(-b/c)^k}{k!\,(a+b+ck-t)}.$$
The characteristic function is \(\varphi(\omega)=M(i\omega)\).
Entropy#
The (differential) entropy is

$$H(X) = -\int_0^\infty f(x)\log f(x)\,dx.$$

No simple closed form is in common use; in practice it is computed numerically (SciPy provides genexpon.entropy).
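As a sanity check, the entropy integral can be evaluated by quadrature and compared to SciPy (a sketch using this notebook's default parameters):

```python
import numpy as np
from scipy import stats, integrate

a, b, c = 1.2, 0.7, 2.0

# H = -∫ f(x) log f(x) dx over the support [0, ∞); using logpdf avoids log(0)
integrand = lambda x: -stats.genexpon.pdf(x, a, b, c) * stats.genexpon.logpdf(x, a, b, c)
H_quad, _ = integrate.quad(integrand, 0, np.inf)

print(H_quad)
print(float(stats.genexpon.entropy(a, b, c)))
```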
def genexpon_raw_moment_series(n, a, b, c, terms=5000):
'''Raw moment E[X^n] for the *standard* genexpon(a,b,c) via a convergent series.'''
if n < 1 or int(n) != n:
raise ValueError("n must be a positive integer")
n = int(n)
k = np.arange(terms)
lam = a + b + c * k
# term_k = (-b/c)^k / (k! * lam^n)
# Use log-space for numerical stability.
sign = np.where(k % 2 == 0, 1.0, -1.0)
log_abs = k * np.log(b / c) - gammaln(k + 1) - n * np.log(lam)
series = np.sum(sign * np.exp(log_abs))
return np.exp(b / c) * math.factorial(n) * series
def genexpon_mgf_series(t, a, b, c, terms=5000):
'''MGF M(t) for t < a+b (standard genexpon), via a series.'''
s = a + b
if not (t < s):
raise ValueError("MGF diverges for t >= a+b")
k = np.arange(terms)
denom = (a + b + c * k) - t
sign = np.where(k % 2 == 0, 1.0, -1.0)
log_abs = k * np.log(b / c) - gammaln(k + 1) - np.log(denom)
series = np.sum(sign * np.exp(log_abs))
return 1.0 + t * np.exp(b / c) * series
# Compare series moments to SciPy's numerical stats
params = (A0, B0, C0)
mean_sp, var_sp, skew_sp, kurt_excess_sp = genexpon_sp.stats(*params, moments="mvsk")
entropy_sp = genexpon_sp.entropy(*params)
m1 = genexpon_raw_moment_series(1, *params)
m2 = genexpon_raw_moment_series(2, *params)
m3 = genexpon_raw_moment_series(3, *params)
m4 = genexpon_raw_moment_series(4, *params)
mean_series = m1
var_series = m2 - m1**2
mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
skew_series = mu3 / var_series ** 1.5
kurt_excess_series = mu4 / var_series**2 - 3
print("params (a,b,c):", params)
print("mean (series):", mean_series)
print("mean (SciPy) :", float(mean_sp))
print("var (series):", var_series)
print("var (SciPy) :", float(var_sp))
print("skew (series):", skew_series)
print("skew (SciPy) :", float(skew_sp))
print("kurtosis excess (series):", kurt_excess_series)
print("kurtosis excess (SciPy) :", float(kurt_excess_sp))
print("entropy (SciPy):", float(entropy_sp))
# MGF sanity check against Monte Carlo for a t < a+b
t = 0.4 * (A0 + B0)
mgf_series = genexpon_mgf_series(t, *params)
mc = genexpon_sp.rvs(*params, size=60_000, random_state=rng)
mgf_mc = np.mean(np.exp(t * mc))
print("\nMGF check at t=", t)
print("M(t) series:", mgf_series)
print("M(t) MC :", mgf_mc)
params (a,b,c): (1.2, 0.7, 2.0)
mean (series): 0.6330583443564004
mean (SciPy) : 0.6330583443564002
var (series): 0.32480146094947066
var (SciPy) : 0.32480146094957457
skew (series): 1.7447600281534774
skew (SciPy) : 1.7447600281939166
kurtosis excess (series): 4.621541991785324
kurtosis excess (SciPy) : 4.621541990075665
entropy (SciPy): 0.5344790035243877
MGF check at t= 0.76
M(t) series: 1.8376730717305914
M(t) MC : 1.8326085115945967
5) Parameter interpretation#
A useful way to interpret \((a,b,c)\) is through the hazard

$$h(x) = a + b\left(1 - e^{-cx}\right).$$
\(a\) (baseline hazard): \(h(0)=a\). Larger \(a\) pushes mass toward 0 (shorter waiting times).
\(b\) (extra hazard that turns on): sets the asymptotic increase. As \(x\to\infty\), \(h(x)\to a+b\).
\(c\) (transition speed): how quickly the hazard rises from \(a\) to \(a+b\).
large \(c\) → fast transition (closer to an exponential with rate \(a+b\))
small \(c\) → slow transition (hazard stays near \(a\) for longer)
Below we visualize how the hazard and PDF change when we vary \(c\).
# Grids for visualization
x_max = float(genexpon_sp.ppf(0.995, A0, B0, C0))
x = np.linspace(0, x_max, 800)
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=["Hazard h(x)", "PDF f(x)"],
)
# Vary c (transition speed)
for c in [0.5, 1.0, 2.0, 6.0]:
fig.add_trace(
go.Scatter(x=x, y=genexpon_hazard(x, A0, B0, c), mode="lines", name=f"c={c}"),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(x=x, y=genexpon_pdf(x, A0, B0, c), mode="lines", name=f"c={c}"),
row=1,
col=2,
)
fig.update_layout(width=980, height=420, title="Effect of c (a and b fixed)")
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="hazard", row=1, col=1)
fig.update_yaxes(title_text="density", row=1, col=2)
fig.show()
6) Derivations#
6.1 From hazard to survival/CDF/PDF#
For a nonnegative lifetime \(X\) with hazard \(h(x)\), the survival function satisfies

$$S(x) = \exp\!\left(-\int_0^x h(t)\,dt\right).$$

For genexpon, plug in \(h(t)=a + b(1-e^{-ct})\):

$$\int_0^x \left(a + b\left(1 - e^{-ct}\right)\right)dt = (a+b)x - \frac{b}{c}\left(1 - e^{-cx}\right).$$

Therefore

$$S(x) = \exp\!\left(-(a+b)x + \frac{b}{c}\left(1 - e^{-cx}\right)\right),$$

and the PDF is \(f(x)=h(x)S(x)\).
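The identity \(f(x)=h(x)S(x)\) can be checked numerically against SciPy on a grid:

```python
import numpy as np
from scipy.stats import genexpon

a, b, c = 1.2, 0.7, 2.0
x = np.linspace(0.0, 5.0, 60)

hazard = a + b * (-np.expm1(-c * x))  # a + b(1 - e^{-cx})
assert np.allclose(genexpon.pdf(x, a, b, c), hazard * genexpon.sf(x, a, b, c))
print("f = h * S holds on the grid")
```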
6.2 Expectation and variance#
For any nonnegative \(X\) and integer \(n\ge 1\),

$$\mathbb{E}[X^n] = \int_0^\infty n\,x^{n-1}\,S(x)\,dx.$$

Write the survival as

$$S(x) = e^{b/c}\,e^{-(a+b)x}\,e^{-(b/c)e^{-cx}} = e^{b/c}\sum_{k\ge 0}\frac{(-b/c)^k}{k!}\,e^{-(a+b+ck)x}.$$

Insert into the moment integral and integrate term-by-term, using \(\int_0^\infty n\,x^{n-1}e^{-\lambda x}\,dx = n!/\lambda^n\). This yields the raw-moment series used earlier:

$$\mathbb{E}[X^n] = n!\,e^{b/c}\sum_{k\ge 0}\frac{(-b/c)^k}{k!\,(a+b+ck)^n}.$$

Then

$$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2.$$
6.3 Inverse CDF (Lambert W)#
Setting \(F(x)=p\) and solving for \(x\) gives a closed form using the Lambert W function. Define \(s=a+b\) and

$$q = \frac{b}{c} - \log(1-p).$$

Then

$$F^{-1}(p) = \frac{q}{s} + \frac{1}{c}\,W\!\left(-\frac{b}{s}\,e^{-cq/s}\right),$$

where \(W\) is the principal real branch of Lambert W.
SciPy’s implementation uses exactly this formula (scipy.special.lambertw).
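As a sketch, writing \(s=a+b\) and \(q = b/c - \log(1-p)\), the closed form \(F^{-1}(p) = q/s + W\!\left(-(b/s)e^{-cq/s}\right)/c\) can be implemented with scipy.special.lambertw and checked against SciPy's ppf (the helper name genexpon_ppf_lambertw is ours):

```python
import numpy as np
from scipy.special import lambertw
from scipy.stats import genexpon

def genexpon_ppf_lambertw(p, a, b, c):
    """Inverse CDF: x = q/s + W(-(b/s) e^{-c q / s}) / c with s = a+b."""
    s = a + b
    q = b / c - np.log1p(-np.asarray(p, dtype=float))
    # The argument of W lies in (-1/e, 0), so the principal branch applies
    w = lambertw(-(b / s) * np.exp(-c * q / s)).real
    return q / s + w / c

p = np.array([0.05, 0.5, 0.95])
print(genexpon_ppf_lambertw(p, 1.2, 0.7, 2.0))
print(genexpon.ppf(p, 1.2, 0.7, 2.0))
```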
6.4 Likelihood (i.i.d. sample)#
For observations \(x_1,\dots,x_n\) with \(x_i\ge 0\), the log-likelihood is

$$\ell(a,b,c) = \sum_{i=1}^n\left[\log\!\left(a + b\left(1 - e^{-cx_i}\right)\right) - (a+b)x_i + \frac{b}{c}\left(1 - e^{-cx_i}\right)\right].$$
To fit parameters numerically, it’s common to optimize over unconstrained variables, e.g.
\((\alpha,\beta,\gamma)=(\log a,\log b,\log c)\).
SciPy’s genexpon.fit performs MLE under the hood.
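A minimal sketch of that reparameterized MLE, with loc and scale fixed at 0 and 1 (the helper neg_loglik is ours, not SciPy API):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
x = stats.genexpon.rvs(1.2, 0.7, 2.0, size=1500, random_state=rng)

def neg_loglik(theta, x):
    a, b, c = np.exp(theta)  # theta = (log a, log b, log c); positivity is automatic
    return -np.sum(stats.genexpon.logpdf(x, a, b, c))

res = optimize.minimize(neg_loglik, x0=np.zeros(3), args=(x,),
                        method="Nelder-Mead")
print("MLE (a, b, c):", np.exp(res.x))
```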
# Numerical sanity checks for E[X] and Var(X) via survival integrals on a grid
x_hi = float(genexpon_sp.ppf(0.9999, A0, B0, C0))
x_grid = np.linspace(0, x_hi, 80_000)
sf_grid = genexpon_sf(x_grid, A0, B0, C0)
# E[X] = ∫_0^∞ S(x) dx  (np.trapezoid; on NumPy < 2.0 use np.trapz)
mean_num = np.trapezoid(sf_grid, x_grid)
# E[X^2] = 2 ∫_0^∞ x S(x) dx
ex2_num = 2 * np.trapezoid(x_grid * sf_grid, x_grid)
var_num = ex2_num - mean_num**2
print("mean (series):", mean_series)
print("mean (grid) :", mean_num)
print("var (series):", var_series)
print("var (grid) :", var_num)
mean (series): 0.6330583443564004
mean (grid) : 0.6330057127705082
var (series): 0.32480146094947066
var (grid) : 0.32428303098588623
7) Sampling & simulation (NumPy only)#
Inverse transform sampling#
To sample from a continuous distribution, a standard method is:
Draw \(U\sim\mathrm{Uniform}(0,1)\).
Return \(X = F^{-1}(U)\).
For genexpon, \(F\) is explicit, and \(F^{-1}\) has a Lambert-W closed form.
Why use bisection here?#
If you want NumPy-only sampling (no special functions), you can still invert \(F\) numerically. Because \(F\) is monotone increasing, bisection is robust:
Maintain an interval \([\ell, u]\) such that \(F(\ell)\le p \le F(u)\).
Repeatedly cut the interval in half until the desired tolerance.
A convenient guaranteed upper bound uses the fact that \(h(x)\ge a\) for all \(x\ge 0\), so \(S(x)\le e^{-ax}\) and therefore

$$F(x) \ge 1 - e^{-ax}.$$

So choosing

$$u = -\frac{\log(1-p)}{a}$$

guarantees \(F(u)\ge p\).
def genexpon_ppf_bisect(p, a, b, c, loc=0.0, scale=1.0, max_iter=70):
'''Percent point function (inverse CDF) via bisection (NumPy-only).'''
_validate_genexpon_params(a, b, c, scale)
p = np.asarray(p, dtype=float)
# Handle edge cases
out = np.empty_like(p)
out[p <= 0] = loc
out[p >= 1] = np.inf
mask = (p > 0) & (p < 1)
if not np.any(mask):
return out
pp = p[mask]
# Work in standardized coordinates z >= 0
lo = np.zeros_like(pp)
hi = -np.log1p(-pp) / a # guaranteed bracket due to S(x) <= exp(-a x)
for _ in range(max_iter):
mid = 0.5 * (lo + hi)
cdf_mid = genexpon_cdf(mid, a, b, c)
left = cdf_mid < pp
lo[left] = mid[left]
hi[~left] = mid[~left]
out[mask] = loc + scale * hi
return out
def sample_genexpon(size, a, b, c, loc=0.0, scale=1.0, rng=None):
'''Sample from genexpon(a,b,c,loc,scale) using NumPy-only inverse-CDF sampling.'''
if rng is None:
rng = np.random.default_rng()
u = rng.uniform(0.0, 1.0, size=size)
return genexpon_ppf_bisect(u, a, b, c, loc=loc, scale=scale)
n = 20_000
samples = sample_genexpon(n, A0, B0, C0, rng=rng)
print("sample mean:", samples.mean())
print("sample var :", samples.var())
print("theory mean (series):", mean_series)
print("theory var (series):", var_series)
sample mean: 0.630914735153167
sample var : 0.3263278688475253
theory mean (series): 0.6330583443564004
theory var (series): 0.32480146094947066
8) Visualization#
Below are:
the analytic PDF and CDF
a Monte Carlo histogram overlaid with the PDF
x_max = float(genexpon_ppf_bisect(0.999, A0, B0, C0))
x_grid = np.linspace(0, x_max, 800)
pdf_grid = genexpon_pdf(x_grid, A0, B0, C0)
cdf_grid = genexpon_cdf(x_grid, A0, B0, C0)
fig = make_subplots(
rows=1,
cols=3,
subplot_titles=["PDF", "CDF", "Samples (hist) + PDF"],
)
fig.add_trace(go.Scatter(x=x_grid, y=pdf_grid, mode="lines", name="pdf"), row=1, col=1)
fig.add_trace(go.Scatter(x=x_grid, y=cdf_grid, mode="lines", name="cdf"), row=1, col=2)
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=70,
histnorm="probability density",
name="samples",
opacity=0.6,
),
row=1,
col=3,
)
fig.add_trace(go.Scatter(x=x_grid, y=pdf_grid, mode="lines", name="pdf"), row=1, col=3)
fig.update_layout(width=1100, height=420, title=f"genexpon PDF/CDF + samples (a={A0}, b={B0}, c={C0})")
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_xaxes(title_text="x", row=1, col=3)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_yaxes(title_text="probability", row=1, col=2)
fig.update_yaxes(title_text="density", row=1, col=3)
fig.show()
9) SciPy integration (scipy.stats.genexpon)#
SciPy provides scipy.stats.genexpon with shape parameters (a, b, c) plus loc and scale.
Common methods:
- `genexpon_sp.pdf(x, a, b, c, loc=..., scale=...)`
- `genexpon_sp.cdf(x, a, b, c, loc=..., scale=...)`
- `genexpon_sp.ppf(q, a, b, c, loc=..., scale=...)` (uses a Lambert-W closed form)
- `genexpon_sp.rvs(a, b, c, loc=..., scale=..., size=..., random_state=...)`
- `genexpon_sp.fit(data, ...)` → MLE for `(a, b, c, loc, scale)` (you can fix loc/scale with `floc`/`fscale`)
Tip: for extreme tails, logpdf, logcdf, logsf are usually more stable than pdf, cdf, sf.
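A quick illustration of that tip far in the tail:

```python
import numpy as np
from scipy.stats import genexpon

a, b, c = 1.2, 0.7, 2.0
x = 500.0
print(genexpon.pdf(x, a, b, c))     # underflows to exactly 0.0
print(genexpon.logpdf(x, a, b, c))  # finite log-density, still usable
```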
# Match our NumPy-only PDF/CDF to SciPy on a grid
x = np.linspace(0, float(genexpon_sp.ppf(0.95, A0, B0, C0)), 25)
pdf_diff = np.max(np.abs(genexpon_pdf(x, A0, B0, C0) - genexpon_sp.pdf(x, A0, B0, C0)))
cdf_diff = np.max(np.abs(genexpon_cdf(x, A0, B0, C0) - genexpon_sp.cdf(x, A0, B0, C0)))
print("max |pdf - scipy|:", pdf_diff)
print("max |cdf - scipy|:", cdf_diff)
# Demonstrate rvs + fit
loc_true, scale_true = 0.8, 1.3
data = genexpon_sp.rvs(A0, B0, C0, loc=loc_true, scale=scale_true, size=2500, random_state=rng)
# Fitting all parameters can be unstable; in applications you often fix loc if the support start is known.
a_hat, b_hat, c_hat, loc_hat, scale_hat = genexpon_sp.fit(data)
print("\ntrue (a,b,c,loc,scale):", (A0, B0, C0, loc_true, scale_true))
print("fit (a,b,c,loc,scale):", (a_hat, b_hat, c_hat, loc_hat, scale_hat))
# Visualize fitted vs true PDF
x_grid = np.linspace(loc_true, np.quantile(data, 0.995), 600)
fig = go.Figure()
fig.add_trace(
go.Histogram(x=data, nbinsx=60, histnorm="probability density", name="data", opacity=0.45)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=genexpon_sp.pdf(x_grid, A0, B0, C0, loc=loc_true, scale=scale_true),
mode="lines",
name="true pdf",
)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=genexpon_sp.pdf(x_grid, a_hat, b_hat, c_hat, loc=loc_hat, scale=scale_hat),
mode="lines",
name="fitted pdf",
)
)
fig.update_layout(width=950, height=420, title="SciPy fit: true vs fitted PDF")
fig.update_xaxes(title_text="x")
fig.update_yaxes(title_text="density")
fig.show()
max |pdf - scipy|: 1.1102230246251565e-16
max |cdf - scipy|: 5.551115123125783e-17
true (a,b,c,loc,scale): (1.2, 0.7, 2.0, 0.8, 1.3)
fit (a,b,c,loc,scale): (0.7730376604261671, 0.5652789598973224, 1.2253635858863332, 0.8000577257220729, 0.8518451733244654)
10) Statistical use cases#
10.1 Hypothesis testing / goodness-of-fit#
A common workflow:
Fit parameters (MLE).
Check fit via the probability integral transform (PIT): \(u_i = F(x_i;\hat\theta)\). If the model is correct, the \(u_i\) should look approximately \(\mathrm{Uniform}(0,1)\).
Use a uniformity test (e.g. KS test) with care: when parameters are estimated, classical p-values are not exact; use a parametric bootstrap if you need calibrated p-values.
10.2 Bayesian modeling#
In Bayesian survival modeling, you can place priors on positive parameters, e.g.
\(\log a,\log b,\log c \sim \mathcal{N}(0,\sigma^2)\) (log-normal priors for \(a,b,c\))
and use the genexpon log-likelihood.
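A minimal MAP sketch under these priors (we take \(\sigma=1\) purely for illustration; a full Bayesian analysis would sample the posterior rather than optimize):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
x = stats.genexpon.rvs(1.2, 0.7, 2.0, size=800, random_state=rng)

def neg_log_posterior(theta, x, sigma=1.0):
    a, b, c = np.exp(theta)                       # theta = (log a, log b, log c)
    loglik = np.sum(stats.genexpon.logpdf(x, a, b, c))
    logprior = np.sum(stats.norm.logpdf(theta, 0.0, sigma))  # N(0, sigma^2) on the logs
    return -(loglik + logprior)

res = optimize.minimize(neg_log_posterior, x0=np.zeros(3), args=(x,),
                        method="Nelder-Mead")
print("MAP (a, b, c):", np.exp(res.x))
```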
10.3 Generative modeling#
Because sampling is straightforward (either via Lambert W or numerical inversion), genexpon is useful as a generative building block for waiting times in simulations:
simulate interarrival times
build renewal processes
stress-test systems with increasing-but-bounded hazard
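For example, a small renewal-process sketch with genexpon interarrival times (the horizon T and sample sizes are arbitrary choices):

```python
import numpy as np
from scipy.stats import genexpon

rng = np.random.default_rng(3)
a, b, c = 1.2, 0.7, 2.0
T = 50.0  # simulation horizon

# Draw plenty of interarrival times; cumulative sums are event times
gaps = genexpon.rvs(a, b, c, size=400, random_state=rng)
times = np.cumsum(gaps)
n_events = int(np.searchsorted(times, T))

print("events in [0, T]:", n_events)
print("rough expectation T / E[X]:", T / genexpon.mean(a, b, c))
```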
# A simple goodness-of-fit workflow with PIT + a KS test
n = 1200
x = genexpon_sp.rvs(A0, B0, C0, size=n, random_state=rng)
# Fit the *standard* model (fix loc=0, scale=1) to focus on (a,b,c)
a_hat, b_hat, c_hat, _, _ = genexpon_sp.fit(x, floc=0, fscale=1)
u = genexpon_sp.cdf(x, a_hat, b_hat, c_hat, loc=0, scale=1)
ks = stats.kstest(u, "uniform")
print("fit (a,b,c):", (a_hat, b_hat, c_hat))
print("KS statistic:", ks.statistic)
print("KS p-value :", ks.pvalue)
print("(p-value is not calibrated when parameters are estimated; bootstrap if needed)")
# PIT histogram
fig = go.Figure()
fig.add_trace(go.Histogram(x=u, nbinsx=40, name="PIT u_i", opacity=0.75))
fig.update_layout(width=850, height=350, title="Probability integral transform (should look Uniform(0,1))")
fig.update_xaxes(title_text="u")
fig.update_yaxes(title_text="count")
fig.show()
# Model comparison example: exponential vs genexpon (AIC)
loc_e, scale_e = stats.expon.fit(x, floc=0)
ll_exp = np.sum(stats.expon.logpdf(x, loc=0, scale=scale_e))
ll_gen = np.sum(genexpon_sp.logpdf(x, a_hat, b_hat, c_hat, loc=0, scale=1))
k_exp = 1
k_gen = 3
print("\nAIC (lower is better)")
print("AIC exponential:", 2 * k_exp - 2 * ll_exp)
print("AIC genexpon :", 2 * k_gen - 2 * ll_gen)
fit (a,b,c): (1.1737074881086293, 0.8111183222726284, 1.4769073218547977)
KS statistic: 0.019322688734407012
KS p-value : 0.7538128001659254
(p-value is not calibrated when parameters are estimated; bootstrap if needed)
AIC (lower is better)
AIC exponential: 1345.9993175118516
AIC genexpon : 1325.120700479394
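To calibrate the KS p-value, a parametric bootstrap refits the model on data simulated from the fitted parameters. A small sketch (B is kept tiny for speed; use hundreds of replicates in practice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = stats.genexpon.rvs(1.2, 0.7, 2.0, size=150, random_state=rng)

def ks_after_fit(data):
    # Fit (a, b, c) with loc/scale fixed, then KS on the PIT values
    a, b, c, _, _ = stats.genexpon.fit(data, floc=0, fscale=1)
    u = stats.genexpon.cdf(data, a, b, c)
    return stats.kstest(u, "uniform").statistic, (a, b, c)

d_obs, (a_hat, b_hat, c_hat) = ks_after_fit(x)

B = 12  # small for speed only
d_boot = np.array([
    ks_after_fit(stats.genexpon.rvs(a_hat, b_hat, c_hat, size=len(x),
                                    random_state=rng))[0]
    for _ in range(B)
])
p_boot = np.mean(d_boot >= d_obs)
print("observed KS:", d_obs, "bootstrap p-value:", p_boot)
```

Refitting on every simulated dataset is what makes the null distribution of the statistic account for parameter estimation.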
11) Pitfalls#
- Invalid parameters: SciPy requires \(a,b,c>0\) (and `scale > 0`). If your model suggests \(b=0\) (pure exponential), fit an exponential directly.
- Identifiability / flat likelihood: \((b,c)\) can be weakly identified when \(c\) is extremely small or extremely large (different combinations can yield very similar hazards).
- Fitting loc/scale: if you know the support starts at 0, fix `floc=0` (and often `fscale=1`) to reduce instability.
- Numerical underflow in tails: `pdf(x)` may underflow to 0 for large \(x\); prefer `logpdf` and `logsf` for likelihood work.
- Sampling via numeric inversion: bisection is robust but can be slow for huge sample sizes; SciPy's `ppf` uses Lambert W and is faster.
12) Summary#
- `genexpon` is a continuous distribution on \([0,\infty)\) with hazard \(h(x)=a + b(1-e^{-cx})\) that increases from \(a\) to \(a+b\).
- It has an explicit survival function \(S(x)=\exp(-(a+b)x + (b/c)(1-e^{-cx}))\), so \(F(x)=1-S(x)\).
- All moments exist; integer raw moments have a convenient convergent series: \(\mathbb{E}[X^n]=n!\,e^{b/c}\sum_{k\ge 0}\dfrac{(-b/c)^k}{k!\,(a+b+ck)^n}\).
- The inverse CDF has a closed form using Lambert W (SciPy uses it); NumPy-only sampling can be done by monotone bisection.
- In practice, `scipy.stats.genexpon` provides `pdf`/`cdf`/`ppf`/`rvs`/`fit` and is a handy alternative to an exponential model when the hazard is increasing but bounded.